Sentence Analysis and Collocation Identification
نویسندگان
چکیده
Identifying collocations in a sentence, in order to ensure their proper processing in subsequent applications, and performing the syntactic analysis of the sentence are interrelated processes. Syntactic information is crucial for detecting collocations, and vice versa, collocational information is useful for parsing. This article describes an original approach in which collocations are identified in a sentence as soon as possible during the analysis of that sentence, rather than at the end of the analysis, as in our previous work. In this way, priority is given to parsing alternatives involving collocations, and collocational information guide the parser through the maze of alternatives. This solution was shown to lead to substantial improvements in the performance of both tasks (collocation identification and parsing), and in that of a subsequent task (machine translation).
منابع مشابه
An Algorithm Combining Statistics-based and Rules-based for Chunk Identification of Chinese Sentences
Natural language processing (NLP) is a very hot research domain. One important branch of it is sentence analysis, including Chinese sentence analysis. However, currently, no mature deep analysis theories and techniques are available. An alternative way is to perform shallow parsing on sentences which is very popular in the domain. The chunk identification is a fundamental task for shallow parsi...
متن کاملParsing and MWE Detection: Fips at the PARSEME Shared Task
Identifying multiword expressions (MWEs) in a sentence in order to ensure their proper processing in subsequent applications, like machine translation, and performing the syntactic analysis of the sentence are interrelated processes. In our approach, priority is given to parsing alternatives involving collocations, and hence collocational information helps the parser through the maze of alterna...
متن کاملViewing sentence boundary detection as collocation identification
The detection of abbreviations is an important step in the process of sentence boundary detection. We describe a flexible, languageindependent and accurate method based on the idea that an abbreviation can be viewed as a collocation. As such, it can be identified by using methods for collocation detection such as the log likelihood ratio. Although the log likelihood ratio is known to show a goo...
متن کاملSentence Compression for Target-Polarity Word Collocation Extraction
Target-polarity word (T-P) collocation extraction, a basic sentiment analysis task, relies primarily on syntactic features to identify the relationships between targets and polarity words. A major problem of current research is that this task focuses on customer reviews, which are natural or spontaneous, thus posing a challenge to syntactic parsers. We address this problem by proposing a framew...
متن کاملCreating a Multilingual Collocation Dictionary from Large Text Corpora
This paper describes a system of terminological extraction capable of handling multi-word expressions, using a powerful syntactic parser. The system includes a concordancing tool enabling the user to display the context of the collocation, i.e. the sentence or the whole document where the collocation occurs. Since the corpora are multilingual, the system also offers an alignment mechanism for t...
متن کامل